0.1 Why R?

  • Open source
  • Worldwide community
  • Free
  • Used by organizations like Google, the New York Times, the Financial Times, NPR, and the Urban Institute

Let’s make sure we understand how to install R and how to load libraries.
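As a minimal sketch of the install-then-load workflow: packages are installed once per machine and loaded once per session. The package list below is taken from the libraries used later in these slides; the object names (pkgs, installed, missing_pkgs) are illustrative only.

```r
# Packages used later in these slides
pkgs <- c("ggplot2", "plotly", "psych")

# requireNamespace() returns TRUE if a package is installed and FALSE if not
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
installed

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!installed]
missing_pkgs

# Install once (uncomment to run; needs an internet connection)
# install.packages(missing_pkgs)

# Then load in every new session, for example:
# library(ggplot2)
```

Keeping install.packages() commented out in a script avoids reinstalling packages every time the script runs.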

0.2 Why R? (2)

library(ggplot2)

library(plotly)

myplot <- ggplot(subset(GSS2014, 
                        !is.na(health)), # health is not missing
                 aes(x = health, # x is health
                     y = coninc, # y is income
                     color = health)) + # color is health
  geom_jitter() + # draw jittered points
  geom_boxplot() + # draw boxplots
  labs(title = "Health And Income",
       y = "Income in Constant $",
       x = "Health")

ggplotly(myplot)

0.3 What are we doing???

  • Start R
  • Get some data in it
  • A few descriptive statistics
  • Some graphs

0.4 Using General Social Survey (GSS) for Example

  • General Social Survey: a nationally representative survey, collected annually or biennially since 1972.

  • Download the data from Canvas to your Mac or Windows desktop, then start RStudio and open the file in R.

0.5 Data Are Just Rows and Columns

We use both the codebook and data set.
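To make the rows-and-columns idea concrete, here is a small sketch using a toy data frame. The values are made up; only the column names health and coninc echo variables from the GSS example, and the object name toy is hypothetical.

```r
# Toy data frame standing in for a survey data set (values are made up)
toy <- data.frame(
  id     = 1:4,
  health = c("good", "fair", NA, "excellent"),
  coninc = c(33255, 17551, 60967, NA)
)

dim(toy)      # number of rows (respondents), then number of columns (variables)
names(toy)    # column names, which you look up in the codebook
str(toy)      # type and first few values of each column
head(toy, 2)  # first two rows
```

The same four functions work on the real data set once it is loaded.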

0.6 Scripting (R Syntax)

0.7 Get Data (Script)

# local file

# make sure you are in the right directory
# Menu: Session | Set Working Directory

load("GSS2014.Rdata") 

There is a menu option as well.

  • Loading your data is sometimes the hardest part.
  • load() is for data that is ALREADY IN R FORMAT!
  • Pay attention to WHERE your data and scripts live.
  • Note that R uses forward slashes in file paths: /
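The points above can be tried out without the course data. This sketch saves a toy object in R format to a temporary folder and loads it back; the file name toy.Rdata and the object names are hypothetical.

```r
# Toy object saved in R format to a temporary folder
toy <- data.frame(x = 1:3)
path <- file.path(tempdir(), "toy.Rdata")  # file.path() joins pieces with /
save(toy, file = path)

rm(toy)               # remove the object from the workspace
loaded <- load(path)  # load() restores objects under the names they were saved with
loaded                # names of the restored objects
exists("toy")         # the object is back

getwd()               # the folder where R looks for relative file names
```

Because load() restores objects under their saved names, there is no need to assign its result to get the data back.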

0.8 The R Interface

0.9 Measures of Central Tendency

  • What are the mean, median, and mode? Why are they different?

summary(GSS2014$coninc)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##    369.5  17551.2  33255.0  48603.3  60967.5 160742.2      224

library(psych) # load the psych package, which provides describe()

describe(GSS2014$coninc)
##    vars    n     mean       sd median  trimmed      mad   min      max    range
## X1    1 2314 48603.29 43340.89  33255 40902.37 28760.59 369.5 160742.2 160372.7
##    skew kurtosis     se
## X1 1.42     1.34 900.98
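A small sketch of why the three measures can differ, using made-up, right-skewed values rather than the GSS data. The names income, counts, and stat_mode are illustrative. Base R has no built-in function for the statistical mode, so one common workaround is to tabulate the values.

```r
income <- c(10, 20, 20, 30, 40, 50, 250)  # made-up, right-skewed values

mean(income)    # pulled upward by the one large value
median(income)  # middle value, less sensitive to the large value

# mode() in base R reports the storage type, not the most frequent value,
# so tabulate the values and take the most frequent one instead.
counts <- table(income)
stat_mode <- as.numeric(names(counts)[which.max(counts)])
stat_mode

# With missing values, mean() and median() return NA unless na.rm = TRUE
mean(c(income, NA), na.rm = TRUE)
```

Here the mean (60) exceeds the median (30), the same pattern as in the income output above, where positive skew places the mean well above the median.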

0.10 We End With a Graph

hist(GSS2014$coninc,
     col = "blue") # histogram of the income variable

0.11 And Another

pie(table(GSS2014$sex),
    labels = c("male", "female"), # assumes the table lists male first
    col = c("blue", "gold"))
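Typing the labels by hand only works if they match the order of the table. As a hedged alternative, this sketch with made-up values takes the labels from the table itself; the objects sex and counts are hypothetical stand-ins for the GSS variable.

```r
sex <- factor(c("male", "female", "female", "male", "female"))  # made-up values

counts <- table(sex)
counts  # check the order of the categories before labeling

pie(counts,
    labels = names(counts),  # labels follow the table's own order
    col = c("blue", "gold"))
```

In this toy example the factor levels sort alphabetically, so "female" comes first, and hand-typed labels in the opposite order would have mislabeled the slices.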

0.12 Questions?